[SPARK-4253] Ignore spark.driver.host in yarn-cluster and standalone-cluster mode #3112
Conversation
Test build #22939 has started for PR 3112 at commit
Test build #22939 has finished for PR 3112 at commit
Test PASSed.
Does this only occur if the user manually sets spark.driver.host?
Yep. Considering that setting this config item in yarn-client mode is fine and switching between yarn-client and yarn-cluster mode is convenient, I think it would be better to ignore it in cluster mode.
I'm just curious what the reason is to set it in client mode? Note that in yarn-client mode the ApplicationMaster handles setting this, so I'm wondering if there is some other use case where you need to manually set it.
If we didn't set it, the YARN cluster would probably not recognize the driver's host (the default value of spark.driver.host), because the nodes in YARN could not resolve its hostname. Setting it to the driver's IP would help.
Sorry, perhaps my question was misleading. Are you, as a user, setting spark.driver.host for some reason? The user doesn't need to set it in yarn mode (either client or cluster); SparkContext sets it for you as the localhost and that gets propagated to the executors. I'm not at all saying this isn't a good change to protect users from doing bad things, I'm just wondering why it was set in the first place. Perhaps there is some use case I'm not aware of.
Here is the scenario: a client submits an application (in yarn-client mode) from a node which is not in the YARN cluster. The ApplicationMaster may not recognize the client's hostname, like this:
I think it happens when the ApplicationMaster cannot resolve the client's hostname (because the client's hostname-IP pair was not added to /etc/hosts or the DNS server). If we set spark.driver.host to the client's IP address, the connection succeeds. That is to say, in yarn-client mode this configuration is actually useful sometimes.
Additionally, I think spark.driver.host is useful in all client modes, including standalone, Mesos (I don't know it very well) and yarn-client mode. When the cluster cannot resolve the client's hostname, we must set this configuration to the client's IP address to avoid failing to connect to the driver. If I have understood it wrong, please correct me. A minimal sketch of this client-mode workaround is shown below.
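(For concreteness, a minimal sketch of the workaround described above, assuming a submitting machine whose hostname the cluster cannot resolve. The IP address, app name, and the old-style "yarn-client" master string are placeholders, not values taken from this PR.)

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch of the workaround discussed above (not part of this patch):
// point spark.driver.host at the client machine's IP because the cluster
// nodes cannot resolve the client's hostname. "10.1.2.3" is a placeholder
// for a cluster-reachable IP of the submitting machine.
val conf = new SparkConf()
  .setMaster("yarn-client")              // client mode: the driver runs on this machine
  .setAppName("driver-host-example")
  .set("spark.driver.host", "10.1.2.3")  // reachable from the AM and executors

val sc = new SparkContext(conf)
```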
OK, I understand the use case for it in yarn-client mode.
I don't think this problem is limited to YARN; I think I've also seen it when submitting jobs to a standalone cluster in cluster deploy mode. When using a cluster deploy mode, whether through YARN or Spark's standalone cluster manager, I think we should completely ignore any values of spark.driver.host.
@JoshRosen Sorry, I had only checked the 1.1.0 documentation (where only client mode is supported in standalone mode), not the latest documents. Now I ignore spark.driver.host in standalone-cluster mode as well.
Test build #23198 has started for PR 3112 at commit
Test build #23200 has started for PR 3112 at commit
// Set the web ui port to be ephemeral so we don't conflict with other spark processes
// running on the same box
conf.set("spark.ui.port", "0")
Why do we need this change? We already have logic that tries to bind to a higher-numbered port if the desired port is unavailable.
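(For context, the existing behavior referred to here is roughly "try the configured port, then fall back to higher-numbered ports on a conflict". The following is a simplified, generic sketch of that idea; it is not Spark's actual implementation, and maxRetries merely stands in for a limit like spark.port.maxRetries.)

```scala
import java.net.{BindException, ServerSocket}

// Generic illustration only: try the requested port and fall back to
// port + 1 whenever binding fails, up to maxRetries attempts.
def bindWithRetries(port: Int, maxRetries: Int = 16): ServerSocket = {
  try {
    new ServerSocket(port)
  } catch {
    case e: BindException =>
      if (maxRetries <= 0) throw e
      else bindWithRetries(port + 1, maxRetries - 1)
  }
}

// Example: the second bind attempt falls back to the next free port.
val first = bindWithRetries(4040)
val second = bindWithRetries(4040)
println((first.getLocalPort, second.getLocalPort)) // e.g. (4040, 4041)
first.close(); second.close()
```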
In Spark Standalone, if we do the ignoring inside of SparkContext itself, then this might cause problems later down the road if the remote container wants to inject some machine-specific values for these settings. Instead, I think that the right way to fix this is to stop the inappropriate sending of machine-local properties to remote machines. Essentially, I think this fix should go into SparkSubmit rather than SparkContext.
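(A hedged sketch of that SparkSubmit-side approach: strip machine-local settings such as spark.driver.host from the properties shipped to a remote driver when the deploy mode is cluster. The names used below, prepareDriverProps, sparkProperties, and deployMode, are illustrative, not the exact identifiers in SparkSubmit.scala.)

```scala
// Illustrative sketch, not the actual patch: drop machine-local properties
// before handing the configuration to a driver that runs inside the cluster.
def prepareDriverProps(
    sparkProperties: Map[String, String],
    deployMode: String): Map[String, String] = {
  if (deployMode == "cluster") {
    // The driver will run on some cluster node we don't know yet, so a host
    // value captured on the submitting machine would be wrong there.
    sparkProperties - "spark.driver.host"
  } else {
    sparkProperties
  }
}

// Example: the property survives in client mode but is dropped in cluster mode.
val submitted = Map("spark.app.name" -> "example", "spark.driver.host" -> "10.1.2.3")
println(prepareDriverProps(submitted, deployMode = "client"))   // keeps spark.driver.host
println(prepareDriverProps(submitted, deployMode = "cluster"))  // drops spark.driver.host
```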
Test build #23198 has finished for PR 3112 at commit
Test PASSed.
Ohh, I got what you mean. It is indeed inappropriate to set it in SparkContext.
Test build #23200 has finished for PR 3112 at commit
Test PASSed.
Test build #23213 has started for PR 3112 at commit
Now we ignore it in SparkSubmit.
Test build #23213 has finished for PR 3112 at commit
Test PASSed.
@JoshRosen Is it ok?
Test build #23313 has started for PR 3112 at commit
Test build #23313 has finished for PR 3112 at commit
Test PASSed.
I have an integration test framework that has a reproduction for the case where I originally observed this problem, so I'd like to try to run your PR there to verify that it fixes the issue. Super minor nit, but do you mind reverting those unrelated formatting changes?
Test build #23347 has started for PR 3112 at commit
@JoshRosen I have reverted the dot, which I think was introduced while modifying the comments, and the blank between
Test build #23347 has finished for PR 3112 at commit
Test PASSed.
I haven't forgotten about reviewing this; while attempting to run my integration test to see whether this fixed my bug, I uncovered a potential regression (unrelated to this patch) in the master branch (and the current 1.1.1 RC): https://issues.apache.org/jira/browse/SPARK-4434 Once we fix that issue, I'd like to try re-running my test.
@JoshRosen I have tested this on my cluster in yarn mode; here are my results:
- before this patch:
- after this patch:
I haven't tested standalone mode, but I think we will get the same behavior.
[SPARK-4253] Ignore spark.driver.host in yarn-cluster and standalone-cluster modes

In yarn-cluster and standalone-cluster modes, we don't know where driver will run until it is launched. If the `spark.driver.host` property is set on the submitting machine and propagated to the driver through SparkConf then this will lead to errors when the driver launches. This patch fixes this issue by dropping the `spark.driver.host` property in SparkSubmit when running in a cluster deploy mode.

Author: WangTaoTheTonic <[email protected]>
Author: WangTao <[email protected]>

Closes #3112 from WangTaoTheTonic/SPARK4253 and squashes the following commits:

ed1a25c [WangTaoTheTonic] revert unrelated formatting issue
02c4e49 [WangTao] add comment
32a3f3f [WangTaoTheTonic] ingore it in SparkSubmit instead of SparkContext
667cf24 [WangTaoTheTonic] document fix
ff8d5f7 [WangTaoTheTonic] also ignore it in standalone cluster mode
2286e6b [WangTao] ignore spark.driver.host in yarn-cluster mode

(cherry picked from commit 8106b1e)
Signed-off-by: Josh Rosen <[email protected]>
Thanks for testing this out! I also tested it with my own integration test and it now passes, so this looks good to me. I'm going to merge this into
I've also backported this to
[SPARK-4253] Ignore spark.driver.host in yarn-cluster and standalone-cluster modes

In yarn-cluster and standalone-cluster modes, we don't know where driver will run until it is launched. If the `spark.driver.host` property is set on the submitting machine and propagated to the driver through SparkConf then this will lead to errors when the driver launches. This patch fixes this issue by dropping the `spark.driver.host` property in SparkSubmit when running in a cluster deploy mode.

Author: WangTaoTheTonic <[email protected]>
Author: WangTao <[email protected]>

Closes #3112 from WangTaoTheTonic/SPARK4253 and squashes the following commits:

ed1a25c [WangTaoTheTonic] revert unrelated formatting issue
02c4e49 [WangTao] add comment
32a3f3f [WangTaoTheTonic] ingore it in SparkSubmit instead of SparkContext
667cf24 [WangTaoTheTonic] document fix
ff8d5f7 [WangTaoTheTonic] also ignore it in standalone cluster mode
2286e6b [WangTao] ignore spark.driver.host in yarn-cluster mode

(cherry picked from commit 8106b1e)
Signed-off-by: Josh Rosen <[email protected]>

Conflicts:
    core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
And I see the description in the configuration documentation is "This is used for communicating with the executors and the standalone Master." Should we modify it as part of this change?